(57) ABSTRACT

A microprocessor with an efficient and powerful coprocessor 
interface architecture is provided. The microprocessor has a 
set of generic coprocessor instructions on its instruction map 
and interface signals dedicated to the coprocessor interface. 
Depending on which coprocessor is interfaced to the 
microprocessor, the generic coprocessor instructions are
renamed to the specific coprocessor commands. When a 
coprocessor instruction for a specific function is fetched and 
decoded by the host processor, the appropriate command is 
issued through the coprocessor interface signals to the 
coprocessor and the coprocessor performs the required 
tasks. Hence, the coprocessor interfaced with the host pro- 
cessor need not have its own program. The pipelined opera- 
tions of the coprocessor are synchronized with pipelined 
operations of the host processor.
DATA PROCESSING SYSTEM AND METHOD FOR PERFORMING ENHANCED PIPELINED OPERATIONS ON INSTRUCTIONS FOR NORMAL AND SPECIFIC FUNCTIONS

FIELD OF THE INVENTION

The present invention relates to a data processing system with a host processor and a coprocessor, and more particularly to a pipelined data processing system having a host processor and a coprocessor which are implemented on a single chip and method of interfacing between the processors for performing enhanced pipelined operations.

BACKGROUND OF THE INVENTION

With an explosive growth in the market of portable electronic products, technological emphasis in VLSI (Very Large Scale Integration) circuit design is shifting away from high speed to low power. However, high-speed functions are still indispensable for a microprocessor (or microcontroller) performing complex mathematical computation operations, for example, multiplications. Such need for speed becomes more pronounced in RISC (Reduced Instruction Set Computer) type processors, DSP (Digital Signal Processing) units, and graphic accelerators because these devices have increased demand for multimedia applications.

As demand grows for enhanced performance of microprocessor-based data processing systems, more complex techniques have been developed and used in microprocessor designs. For example, pipelined data processing techniques such as division of processor operations into a multiplicity of elementary suboperations are employed. With reference to FIGS. 1A and 1B, an execution of an instruction, requiring time T for the execution in a non-pipelined mode of operation, is divided into a plurality of suboperation stages in a pipelined mode of operation. For example, a three-stage pipelined mode of operation typically has three suboperation stages, such as ST1, ST2, and ST3. A processor in a pipelined mode is partitioned in such a manner that each suboperation of a pipelined instruction is completed in a predefined stage time period T'. As a result of such partitions on an instruction, an execution of such a pipelined instruction requires three stage time periods 3T', which is longer than time T required for an execution of an instruction in a non-pipelined mode of operation.

In a pipelined mode of operation in which a processor separately executes each suboperation by partitioning a pipelined instruction, however, a processing of a pipelined instruction can be initiated after a stage time period T' rather than after a time period T as in the non-pipelined mode of operation. Since a stage time period T' for an execution of each suboperation of a pipelined instruction is shorter than time T for an execution of a non-pipelined instruction, the execution of an instruction in a pipelined mode of operation can be expedited. A stage time period T' can be chosen as small as possible consistent with the number of suboperation stages in a pipelined mode of operation unit.

Recent advancements in VLSI technology have made DSP technology readily available, so that it is not difficult to find electronic products equipped with some form of multimedia DSP capability. Many consumer electronic products with multimedia DSP capability have a microprocessor chip for the control and I/O operations and a separate DSP chip for signal processing.

A SOC (System-On-a-Chip) approach is attracting attention of chip designers (particularly, ASIC designers) because such design represents savings in cost, power consumption, system design complexity, and system reliability as compared to designs having two or more separate processors.

A simple solution to an integration of, for example, a microprocessor core and a DSP core is to put the two independent cores on a single die (i.e., a microprocessor core for control tasks and a DSP core for signal processing algorithms). This simple two-core approach [...] chip designers with a flexibility [...] application-specific [...] of microprocessor cores and DSP cores to fit an application optimally. This approach, however, suffers from several drawbacks: (1) Ill Programmability, because the cores should have their own programs and data; (2) Communication Overhead, because resource conflicts, false data dependencies and deadlocks need to be prevented through a complex scheme; and (3) Hardware Overhead due to the duplicated part of the two cores, which results in increased hardware cost and power inefficiency.

Another way to support microprocessor and DSP capabilities on a chip is to use a single processor having both the capabilities, for example, a microprocessor with DSP capabilities or a DSP unit with powerful bit manipulation and branch capabilities.

In general, a microprocessor is a necessary element in electronic products; therefore, there are motivations on the part of designers to integrate a SOC design around a microprocessor. Compared with the two-core approach, the SOC approach can achieve efficient communications between a microprocessor (or a host processor) and its interfaces, for example, coprocessors. By equipping the microprocessor with DSP coprocessor instructions and an interface scheme, control functions [...] implemented on a single processor chip which also provides a single development environment. [...] has other advantages over the two-core approach. For example, DSP programs [...] by using coprocessor instructions of a host processor, and hardware cost can be reduced because there is no hardware duplication.

The overall processing efficiency in such a host-coprocessor SOC architecture is a function of a number of factors: for example, the computing capability of a coprocessor and the information exchange capability between a host processor and a coprocessor. The computing capability of a coprocessor depends upon how many instructions the coprocessor has and how fast the coprocessor executes each instruction. Such features of a coprocessor are knowable by its specification. Thus, an improvement of a coprocessor performance can be achieved by using, within cost limits, a coprocessor with specification of desired features. On the other hand, the information exchange capability between a host processor and a coprocessor is affected by coprocessor interface protocols of a host processor, rather than a coprocessor performance.

In such conventional host-coprocessor SOC techniques, however, in order to improve the coprocessor capabilities, more powerful coprocessor instructions with appropriate data paths needed to be added to a host processor. Such design is tantamount to a new processor chip. If there is a bottleneck in the information exchange between the host and coprocessor, the system performance will not be improved. Hereinafter, an example of such a bottleneck in the information exchange will be explained.

FIG. 2 is a timing diagram showing pipelined executions of three subsequent instructions I1, I2, and I3 in a typical RISC-based host-coprocessor system. Each instruction I1,
I2 or I3 of a RISC instruction pipeline, a so-called three-stage pipeline, has three stages: Instruction Fetch (IF), Instruction Decode (ID), and Execution (EX) stages. Each of the three stages IF, ID and EX for an instruction is intended to be completed in a single cycle of a clock signal CLK.

For the purpose of explanation, in FIG. 2, a first instruction I1 is assumed to be a host processor instruction for an execution of a host processor operation, and second and third instructions I2 and I3 are coprocessor instructions for execution of coprocessor operations. The first instruction I1 is ready to be executed by the host processor alone without coprocessor interfacing, and the second and third instructions I2 and I3 are intended to be executed by the coprocessor responsive to coprocessor commands I2' and I3' (corresponding to instructions I2 and I3, respectively) and coprocessor interface signals INF which are issued by the host processor depending on results of decoding the coprocessor instructions I2 and I3.

Referring to FIG. 2, first the host processor instruction I1 is fetched during cycle T0. That is, the instruction I1 is loaded from a program memory into the host processor. In the next cycle T1, the instruction I1 is decoded therein and at the same time the coprocessor instruction I2 is fetched. The host processor instruction I1 is executed by the host processor during cycle T2, in which the coprocessor instructions I2 and I3 are simultaneously decoded and fetched, respectively. During cycle T3, the host processor issues the coprocessor command I2' corresponding to the instruction I2 and also produces coprocessor interface signals INF for the instruction I2. Thus, the coprocessor is interfaced with the host processor under the control of the interface signals INF and then completes decoding of the command I2' from the host processor. In cycle T4, the coprocessor executes the command I2'.

Due to the execution of the command I2' associated with the instruction I2 in cycle T4, the instruction pipeline has to be stalled for one clock cycle. Hence, the execution stage of the instruction I3 should be suspended for one cycle and then executed in cycle T5. The coprocessor decodes the command I3' corresponding to the instruction I3 during cycle T5, and in the next cycle T6 the command I3' is executed by the coprocessor.

Thus, the pipeline stalling results when the respective coprocessor commands I2' and I3' are decoded in the same clock cycles as the corresponding coprocessor instructions I2 and I3 are executed. Such pipeline stalling behaves like a bottleneck in information exchanges between a host processor and a coprocessor, causing degradations in computing speed and system performance.

SUMMARY OF THE INVENTION

It is an object of the present invention to provide a low-power, low-cost, high-[...] system suitable for multimedia applications [...] an improved host-coprocessor system-on-a-chip (SOC) performing pipelined operations.

It is another object of the present invention to provide a host-coprocessor [...] coprocessor interface scheme.

It is still another object of the present invention to provide a method for accomplishing effective interfaces between a host processor performing pipelined operations and at least one coprocessor on a single chip.

These and other objects, features and advantages of the present invention are provided by a pipelined microprocessor which fetches an instruction, predecodes the fetched instruction when the fetched instruction is identified as a coprocessor instruction during an instruction fetch (IF) cycle of the instruction, and at least one coprocessor for performing additional specific functions. The microprocessor (host processor) issues to the coprocessor a coprocessor command corresponding to the fetched instruction. The coprocessor decodes the coprocessor command in an instruction decode/memory access (ID/MEM) cycle of the instruction and executes the decoded coprocessor command during an instruction execution (EX) cycle of the fetched instruction.

According to a preferred embodiment of the present invention, the host processor generates a plurality of coprocessor interface signals (e.g., A, B, C and D) when the fetched instruction is identified as a coprocessor instruction. Through the coprocessor interface signals the host processor issues the coprocessor command corresponding to the coprocessor instruction. The coprocessor provides its status data to the host processor after executing the coprocessor command in the EX cycle of the instruction.

A data memory is commonly connected to both the host processor and the coprocessor. The coprocessor accesses the data memory only at a time designated by the host processor, and during which the host processor is guaranteed not to access the data memory. An internal clock generation circuit is provided for the host processor and the coprocessor. The clock generation circuit generates internal clock signals synchronized with an external clock signal. The host processor generates the coprocessor interface signals, synchronizing with one of the internal clock signals.

According to another aspect of the present invention, in order to execute a coprocessor instruction for a specific function while performing operations for normal control functions, the host processor checks whether a fetched instruction is a coprocessor instruction. If so, the host processor predecodes the fetched instruction during the IF stage. Then, the host processor issues a coprocessor command corresponding to the fetched instruction in the ID/MEM stage of the instruction. The coprocessor then decodes the coprocessor command in the ID/MEM stage, and executes a coprocessor operation designated by the coprocessor command in the EX stage of the instruction. The coprocessor provides the host processor with coprocessor status data after the execution of the coprocessor operation in the EX stage. Then, the host processor evaluates the coprocessor status data to provide for a next conditional branch instruction.

BRIEF DESCRIPTION OF THE DRAWINGS

A more complete appreciation of the present invention, and many of the attendant advantages thereof, will become readily apparent as the same becomes better understood by reference to the following detailed description when considered in conjunction with the accompanying drawings in which like reference symbols indicate the same or similar components, wherein:

FIGS. 1A and 1B illustrate the division of an operation of an instruction into multiple suboperations;

FIG. 2 is a timing diagram illustrating an example of pipelined mode operations of a typical RISC-based host-coprocessor system;

FIG. 3 is a block diagram illustrating a preferred embodiment of an SOC type of host-coprocessor system according to the present invention;

FIG. 4 is a timing diagram illustrating an example of pipelined mode operations of the system of FIG. 3;

FIG. 5 is a block diagram illustrating a preferred embodiment of the host-coprocessor system of FIG. 3;
FIG. 6 is a timing diagram illustrating a master clock signal and internal clock signals used in the host-coprocessor system of FIGS. 3 and 5;

FIG. 7 is a detailed block diagram of a preferred embodiment of the coprocessor interface unit in FIG. 5;

FIGS. 8A and 8B are detailed circuit diagrams of preferred embodiments of a latch clock generator logic and an interface signal generator logic, respectively;

FIG. 9 is a timing diagram illustrating an execution of a coprocessor instruction in the system of FIG. 5; and

FIG. 10 is a timing diagram illustrating an execution of a data transfer instruction in the system of FIG. 5.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

The present invention will be described in detail with reference to the accompanying drawings. In the following description, specific details are set forth in order to provide a more thorough understanding of the invention. However, the invention may be practiced without such particulars. In some instances, well known elements have not been shown or described to avoid unnecessarily obscuring the present invention. Accordingly, the specification and drawings are to be regarded as illustrative, rather than restrictive.

An efficient and powerful interfacing architecture of the host-coprocessor system of the present invention may be applied to a low-power RISC microprocessor. The efficiency and flexibility of the interfacing architecture are also important to microprocessors with a relatively low bit width (e.g., 8 or 16 bits) which are commonly used for products of low-cost markets, such as consumer electronic market and [...] market.

A host processor (or microprocessor) according to the present invention has a set of generic coprocessor instructions on its instruction map and a set of core pin signals for performing interfaces with at least one coprocessor. The generic coprocessor instructions are renamed to specific coprocessor commands, depending on which coprocessor(s), such as DSP unit(s) or floating-point processor(s), is interfaced with the host processor.

When a coprocessor instruction is fetched and predecoded by the host processor, the host processor issues appropriate commands through coprocessor interface signals to the coprocessor(s), and the coprocessor(s) performs tasks designated by the appropriate commands. Hence, the coprocessor(s) interfacing with the host processor is passive in a sense that it does not have its own program memory. Pipeline mode operations of the coprocessor(s) are synchronized with those of the host processor in a specific manner. Such a synchronization can prevent resource conflicts between the host processor and the coprocessor(s) in their accessing resources, such as data memory. Such features relating to the synchronization will be described in detail below.

Microprocessor Architecture

In FIG. 3, a preferred embodiment of an SOC (system-on-a-chip) type host-coprocessor system according to the present invention is illustrated in block diagram form. An SOC type host-coprocessor system 10 includes a host processor 30, such as a general-purpose microcontroller or microprocessor, and a coprocessor 32, such as a digital signal processing (DSP) unit or a floating-point processing unit. The host processor 30 performs normal control functions and the coprocessor 32 performs additional specific functions. The system 10 also includes several internal buses 38 and 40a-40d. The system 10 is connected to a program memory 34 such as a ROM (read-only memory) and to a data memory 36 such as a DRAM (dynamic random access memory) through the buses 38 and 40a-40d. The program and data memories 34 and 36 store instructions and data items, respectively. The program memory 34 may be incorporated within the system 10. The system 10 further includes an internal clock generation circuit 42 which receives an externally applied clock signal (a so-called master clock) CLK and generates two internal clock signals φCLK1 and φCLK2 synchronized with the external clock signal CLK. A timing diagram illustrating the external and internal clock signals is shown in FIG. 6.

For the purpose of explanation, it is assumed that the host processor 30 is an 8-bit ultra-low power embedded microprocessor which adopts a register-memory Harvard RISC architecture. The host processor 30 has a separate data memory address space and a program memory address space since it adopts such a Harvard architecture. The host processor 30 is also assumed to have sixteen general purpose registers, eleven special purpose registers, and a 32-level hardware stack, which are not shown in FIG. 3. In addition, the maximum of the program memory address space is preferred to be 1M words (20-bit program address lines) and the maximum of the data memory address space is preferably 64K bytes (16-bit data address lines). The coprocessor 32 is preferably a 16-bit fixed-point DSP coprocessor, which may be applied to cost-sensitive low-end multimedia DSP applications. The host processor 30 is pipelined and has 3 stages for pipelined operations, such as IF (Instruction Fetch), ID/MEM (Instruction Decode and Data Memory Access), and EX (Execution) stages.

In IF stage, an instruction is fetched from the program memory 34 and early decoding (or predecoding) of the instruction is carried out. The host processor 30 checks whether the fetched instruction is a coprocessor instruction before the fetched instruction is loaded into instruction registers (IR) [...]. In ID/MEM stage, the fetched instruction is decoded and ALU (arithmetic and logic unit) operands are latched into ALU registers (not shown) of the host processor 30. Data memory accesses, hardware stack accesses, special purpose register accesses, and general-purpose register accesses are also performed in ID/MEM stage. In EX stage, ALU operations and write-back operations into the general purpose registers occur.

When the fetched instruction is identified as a coprocessor instruction, the host processor 30, during ID/MEM period, provides the coprocessor 32 with several interface signals corresponding to the fetched coprocessor instruction, such as A (e.g., 12 bits), B (e.g., 1 bit), C (e.g., 1 bit), and D (e.g., 1 bit). Signal A is an immediate value signal that indicates a coprocessor command or a coprocessor register address for data transfer between the host processor 30 and the coprocessor 32. The coprocessor 32 decodes the signal A as a coprocessor command when signal B is active, or as a coprocessor register address when signal C is active. When data is transmitted from the host processor 30 to the coprocessor 32 (called a CO_WRITE operation), signal D becomes inactive (e.g., high). When data is transmitted from the coprocessor 32 to the host processor 30 (called a CO_READ operation), the signal D becomes active (e.g., low). The data transmission between the host processor 30 and the coprocessor 32 is carried out via their common buses 38, 40b and 40c.

The coprocessor 32 also provides the host processor 30, during its EX period, with status flag data E (e.g., 3 bits)
depending on the execution of the coprocessor instruction. The host processor 30 then utilizes the status flag data E for generating conditional instructions for its next operations without additional coprocessor access operation. This will be described in detail later.

Coprocessor Interface of Host Processor

The interface between a host processor and a coprocessor requires the following aspects: (1) Synchronization of executions of the host processor and the coprocessor; (2) Commands issued to the coprocessor; (3) Data transfer between the host processor and the coprocessor; and (4) Flagging of coprocessor statuses to the host processor. The synchronization scheme between multiple processors should be carefully designed to prevent resource conflicts, false data dependency, and deadlocks. A handshake scheme (request/acknowledge) or a mailbox scheme is a simple solution for the synchronization, especially when the processors are autonomous. However, the processors with such a synchronization scheme suffer from lack of efficiency and hardware overhead.

A host-coprocessor system according to the present invention may relieve such synchronization problems because coprocessor(s) in the host-coprocessor system is passive (or non-autonomous) in a sense that it performs only tasks designated and at a certain time (or cycle) designated, respectively, by the host processor 30.

FIG. 4 is a timing diagram illustrating pipelined executions of three subsequent instructions within the system of FIG. 3. A coprocessor instruction is identified and predecoded in an IF period. During an ID/MEM period, then, the host processor 30 provides the coprocessor 32 with a plurality of coprocessor interface signals A, B, C and D based on information predecoded from the coprocessor instruction. Such a scheme of identifying and predecoding a coprocessor instruction will be described in detail below.

For the purpose of explanation, the first instruction I1 is assumed to be a host processor instruction for an execution of a host processor operation, and the other instructions I2 and I3 are assumed to be coprocessor instructions for executions of coprocessor operations. Thus, the first instruction I1 is only executed by the host processor 30 without coprocessor interfacing, but the second and third instructions I2 and I3 are executed by the coprocessor 32 responsive to coprocessor commands I2' and I3' corresponding to the instructions I2 and I3, respectively. The coprocessor commands I2' and I3' are loaded on the coprocessor interface signals INF (i.e., A, B, C and D). The interface signals INF are issued by the host processor 30 depending on predecoded results of the coprocessor instructions I2 and I3.

With reference to FIG. 4, the first instruction I1 is fetched during cycle T0. That is, the instruction I1 is loaded from the program memory 34 into the host processor 30. The host processor 30 then checks whether the fetched instruction I1 is a coprocessor instruction or not. In the next cycle T1, if the fetched instruction I1 is not a coprocessor instruction, the instruction I1 is decoded by the host processor 30. At the same time, the next instruction I2 is fetched and it is determined whether it is a coprocessor instruction. The host processor 30 then predecodes the instruction I2 when it is a coprocessor instruction, and produces a plurality of predecoding signals DCOP, DCLDW and DCLDR (shown in FIG. 5).

In cycle T2, the host processor instruction I1 is executed, and during the same cycle the next instruction I3 is fetched. If the instruction I3 is identified as a coprocessor instruction, the host processor 30 produces a plurality of predecoding signals based on predecoded results of the coprocessor instruction I3. The host processor 30 also issues a coprocessor command I2' by providing the coprocessor 32 with interface signals INF (i.e., A, B, C and D) depending on predecoded results of the instruction I2. During cycle T2, the coprocessor 32 decodes the command I2'.

In cycle T3, the command I2' is executed by the coprocessor 32 and the host processor 30 issues a coprocessor command I3' by providing the coprocessor 32 with interface signals INF (i.e., A, B, C and D) on the basis of the predecoding results of the instruction I3. The coprocessor 32 then decodes the command I3' and executes the command I2'. In cycle T4, the command I3' is executed by the coprocessor 32.

Even though FIG. 4 shows only three subsequent instructions, the diagram may similarly continue in both directions prior to the instruction I1 and after the instruction I3.

As described above, the predecoding of instructions in IF makes it possible to issue the coprocessor commands I2' and I3' in cycles T2 and T3 (i.e., in the ID/MEM stages of the corresponding instructions I2 and I3). That is, the host processor's ID/MEM and EX stages are completely replaced by the corresponding stages of the coprocessor. As a result, the pipeline stalling does not occur, thereby improving the operation speed.

In an instruction map of the host processor 30 of the present invention, some instruction space is reserved for coprocessor interface instructions shown in Table 1.

TABLE 1

Coprocessor interface instructions

Mnemonic Op1 Op2 Description

As mentioned earlier, a coprocessor instruction is fetched and predecoded by the host processor 30 in IF stage, and the host processor 30 issues a coprocessor command to the coprocessor 32 through the interface signals A (COMMAND/ADDRESS), B (COMMAND IDENTIFICATION), C (ADDRESS IDENTIFICATION) and D (READ/WRITE IDENTIFICATION) in ID/MEM stage.

In Table 1, "COP #imm:12" instruction is to request that the coprocessor 32 perform a specific operation designated by the immediate value (#imm:12). A timing diagram for execution of COP #imm:12 instruction is illustrated in FIG. 9.

With reference to FIG. 9, if a COP #imm:12 instruction is fetched in IF stage, then a 12-bit immediate value (#imm:12) is loaded on a command/address signal A, accompanying a command identification signal B being active (for example, low) in ID/MEM stage, to request the coprocessor 32 to perform designated operations. Then, the 12-bit immediate value (#imm:12) is interpreted by the coprocessor 32. By arranging the 12-bit immediate field, a set of instructions for the coprocessor 32 is determined. In other words, the host processor 30 provides the coprocessor with a set of
generic coprocessor instructions, which is specific to the
coprocessor. A specific coprocessor instruction set can differ
from one coprocessor to another. In EX stage, the coprocessor
32 executes the COP #imm:12 instruction and then provides
its status data E to the host processor 30. The host processor 
30 can use the data E, along with status flags of the host 
processor 30 itself, for generating a next conditional branch 
instruction. 
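
The COP #imm:12 flow described above can be sketched in a few lines of Python. This is only an illustrative model, not the circuit of the embodiment: the command codes `CMD_ADD` and `CMD_SUB`, the accumulator-style operation, and the names `InterfaceSignals`, `StatusE`, `issue_cop` and `execute_command` are assumptions introduced here. Only the 12-bit width of signal A, the role of signals B and C, the 16-bit fixed-point coprocessor, and the sign/overflow/zero content of E come from the description.

```python
from dataclasses import dataclass

# Hypothetical command codes; the description does not define the 12-bit encoding.
CMD_ADD = 0x001
CMD_SUB = 0x002


@dataclass(frozen=True)
class InterfaceSignals:
    a: int           # signal A: 12-bit command/address value
    b_active: bool   # signal B: A carries a coprocessor command
    c_active: bool   # signal C: A carries a coprocessor register address


@dataclass(frozen=True)
class StatusE:
    sign: bool
    overflow: bool
    zero: bool


def issue_cop(imm12: int) -> InterfaceSignals:
    """ID/MEM stage of the host: load #imm:12 on A with B active."""
    if not 0 <= imm12 < (1 << 12):
        raise ValueError("immediate must fit in 12 bits")
    return InterfaceSignals(a=imm12, b_active=True, c_active=False)


def execute_command(sig: InterfaceSignals, acc: int, operand: int):
    """EX stage of the coprocessor: run the command on a 16-bit value, report E."""
    if not sig.b_active:
        raise ValueError("signal A does not carry a command")
    if sig.a == CMD_ADD:
        raw = acc + operand
    elif sig.a == CMD_SUB:
        raw = acc - operand
    else:
        raise ValueError("unknown command code")
    overflow = not (-32768 <= raw <= 32767)
    result = (raw + 32768) % 65536 - 32768  # wrap to 16-bit two's complement
    return result, StatusE(sign=result < 0, overflow=overflow, zero=result == 0)
```

Because the coprocessor alone interprets the 12-bit value, a different coprocessor could attach a different meaning to the same codes without any change to `issue_cop`, which is the sense in which the generic instruction is renamed.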

In Table 1, "CLD Reg, #imm:8" instruction and "CLD 
#imm:8, Reg" instruction are data exchange instructions 
between the host processor 30 and the coprocessor 32. 
"Reg" is a host processor register number, and "#imm:8" 
denotes an 8-bit immediate value that may be a coprocessor 
register address. 

FIG. 10 is a timing diagram illustrating the execution of 
the data exchange instructions. The CLD Reg, #imm:8 instruction
is a CO_READ instruction, and the CLD #imm:8, Reg
instruction is a CO_WRITE instruction. If a CO_READ
instruction is fetched, then the immediate value (#imm:8) is 
loaded on a command/address signal A, accompanying an 
address identification signal C being active (for example, 
low) in ID/MEM stage. When the signal C is active and a 
command identification signal B is inactive, 8-bit immediate 
value (#imm:8) is identified as a coprocessor register 
address. At this time, a read/write identification signal D 
remains high, so that data from the coprocessor 32 is 
transferred via data buses to the host processor 30. 

Similarly, when a CO_WRITE instruction is fetched,
the immediate value (#imm:8) is also loaded on the
command/address signal A, accompanying the address
identification signal C being active in ID/MEM stage. When the
signal C is active, the 8-bit immediate value (#imm:8) is
identified as a coprocessor register address. At this time, the 
read/write identification signal D remains low, so that data 
from the host processor 30 is transferred via data buses to the 
coprocessor 32. 
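
The two CLD data exchange instructions can be modeled as follows. This is a minimal sketch under stated assumptions: the level of signal D follows the FIG. 10 description above (high for CO_READ, low for CO_WRITE), the coprocessor register file is represented by a plain dictionary, and the names `CldSignals`, `encode_cld` and `transfer` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CldSignals:
    a: int           # signal A: 8-bit coprocessor register address
    b_active: bool   # signal B: inactive for data exchange
    c_active: bool   # signal C: A is a coprocessor register address
    d_high: bool     # signal D: high for CO_READ, low for CO_WRITE (per FIG. 10 text)


def encode_cld(imm8: int, read: bool) -> CldSignals:
    """ID/MEM stage: put #imm:8 on A, assert C, and set D for the direction."""
    if not 0 <= imm8 <= 0xFF:
        raise ValueError("immediate must fit in 8 bits")
    return CldSignals(a=imm8, b_active=False, c_active=True, d_high=read)


def transfer(sig: CldSignals, host_regs: list, reg: int, cop_regs: dict) -> None:
    """Move one data item over the shared data bus in the signaled direction."""
    if sig.b_active or not sig.c_active:
        raise ValueError("signals do not describe a register transfer")
    if sig.d_high:
        host_regs[reg] = cop_regs[sig.a]   # CO_READ: coprocessor -> host
    else:
        cop_regs[sig.a] = host_regs[reg]   # CO_WRITE: host -> coprocessor
```

For example, `transfer(encode_cld(0x10, read=True), host_regs, 3, cop_regs)` copies the coprocessor register at address 0x10 into host register 3.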

In Table 1, "JMP E, label", "CALL E, label" and "LNK 
E, label" instructions are conditional branch instructions. 
After an execution of a coprocessor instruction in EX stage 
of a pipelined operation, the host processor 30 is provided 
with coprocessor status data E (e.g., sign, overflow, and zero 
flags) from the coprocessor 32. The host processor 30 can 
utilize the status data E for generating a next conditional 
branch instruction, such as JMP, CALL, or LNK instruction, 
without additional coprocessor access operation, thereby 
considerably decreasing execution time of conditional 
branch instructions. 
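
As an illustration of how the status data E can steer the next instruction, the sketch below resolves a "JMP E, label" style branch directly from the flags returned by the coprocessor. How the instruction selects a particular flag is not stated in this section, so the `condition` argument, the dictionary representation of E, and the name `next_pc` are assumptions made only for the example.

```python
def next_pc(pc: int, label: int, condition: str, e_flags: dict) -> int:
    """Return the next program counter for a branch conditioned on one E flag.

    e_flags holds the coprocessor status data E already delivered in the EX
    stage, so no extra coprocessor access is modeled here.
    """
    if condition not in e_flags:
        raise KeyError("unknown status flag: " + condition)
    return label if e_flags[condition] else pc + 1
```

With `e_flags = {"sign": False, "overflow": False, "zero": True}`, calling `next_pc(10, 40, "zero", e_flags)` takes the branch, while testing `"sign"` falls through to the next instruction.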

Referring to FIG. 5, there is a detailed block diagram
illustrating a preferred embodiment of the host processor 30 
and the coprocessor 32 configured to operate as a three-stage 
pipeline. In the host processor 30, an instruction latch 50, a 
coprocessor instruction identifier 52, and a program counter 
70 serve as an instruction fetch circuit to fetch instructions. 
The program counter 70 indicates a location of the program 
memory 34 where an instruction to be fetched is stored. The 
instruction latch 50 latches the instruction fetched from the 
program memory 34. The coprocessor instruction identifier 
52 checks whether the fetched instruction is a coprocessor 
instruction. If the fetched instruction is identified as a 
coprocessor instruction, the coprocessor instruction identi- 
fier 52 predecodes the fetched instruction and generates a 
plurality of predecoding signals DCOP, DCLDW and 
DCLDR. 

The host processor 30 also includes, for operations in
ID/MEM stages, an instruction register 54, a host instruction
decoder 56, host source registers 58, and a coprocessor
interface unit 154. The instruction register 54 is connected to
the instruction latch 50. The host instruction decoder 56 is
connected to the instruction register 54 and the data memory
36. The host source registers 58 are connected to the host
instruction decoder 56 and the data memory 36. The coprocessor
interface unit 154 is connected to the coprocessor
instruction identifier 52. The coprocessor interface unit 154
generates a plurality of coprocessor interface signals A, B, C
and D in response to the predecoding signals DCOP,
DCLDW and DCLDR.

A host instruction execute stage (or circuit) of the host
processor 30 comprises a host execution unit 62 connected
to the data memory 36 and the host source registers 58, a
host destination register 64 connected to both the data
memory 36 and the host execution unit 62, a host status
register 66 connected to the host execution unit 62, and a
branch condition evaluation unit 68 connected to the host
status register 66, the coprocessor 32 and the program
counter 70. The branch condition evaluation unit 68 checks
statuses of both the host processor 30 and the coprocessor
32.

The coprocessor 32 comprises a coprocessor instruction
decode/memory access circuit and a coprocessor instruction
execute circuit. The coprocessor instruction decode/memory
access circuit is connected to the instruction fetch
circuit and the data memory 36. The coprocessor instruction
decode/memory access circuit comprises a coprocessor
instruction decoder 156 connected to the coprocessor interface
unit 154 and the data memory 36, and coprocessor
source registers 158 connected to the coprocessor instruction
decoder 156 and the data memory 36. The coprocessor
instruction execute circuit comprises a coprocessor execution
unit 162 connected to the coprocessor source registers
158 and the data memory 36, a coprocessor destination
register 164 connected to the coprocessor execution unit 162
and the data memory 36, and a coprocessor status register
166 connected to the coprocessor execution unit 162 and the
branch condition evaluation unit 68 of the host processor 30.

During IF period, an instruction indicated by the program
counter 70 is fetched from the program memory 34 and
latched by the instruction latch 50 synchronized with a first
internal clock signal φCLK1 (referring to FIG. 6). Output of
the instruction latch 50 is provided to the coprocessor
instruction identifier 52. The coprocessor instruction identifier
52 checks whether the fetched instruction is a coprocessor
instruction. If the fetched instruction is identified as
a coprocessor instruction, the coprocessor instruction identifier
52 predecodes the coprocessor instruction and generates
predecoding signals DCOP, DCLDW, and DCLDR
based on the predecoded results. Signal DCOP becomes
active when a COP #imm:12 instruction is fetched, signal
DCLDW becomes active when a CLD #imm:8, Reg instruction
is fetched, and signal DCLDR becomes active when a
CLD Reg, #imm:8 instruction is fetched.
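
The predecoding rule in the paragraph above can be sketched as follows. This Python example works on assembly mnemonics rather than binary encodings, because the opcode bit patterns are not given here; the function name `predecode`, the regular expressions, and the operand syntax (registers written as `R0`, immediates as `#0x12` or `#18`) are assumptions for illustration.

```python
import re

# Assumed textual operand syntax: immediates as '#<number>', registers as 'R<n>'.
_IMM = r"#(?:0[xX][0-9a-fA-F]+|\d+)"
_REG = r"[Rr]\d+"

_COP = re.compile(rf"^COP\s+{_IMM}$")                     # COP #imm:12
_CLD_WRITE = re.compile(rf"^CLD\s+{_IMM}\s*,\s*{_REG}$")  # CLD #imm:8, Reg
_CLD_READ = re.compile(rf"^CLD\s+{_REG}\s*,\s*{_IMM}$")   # CLD Reg, #imm:8


def predecode(instruction: str) -> dict:
    """Return the predecoding signals for one fetched instruction.

    At most one signal is active (True); all three stay inactive
    when the instruction is not a coprocessor instruction.
    """
    text = instruction.strip()
    return {
        "DCOP": bool(_COP.match(text)),
        "DCLDW": bool(_CLD_WRITE.match(text)),
        "DCLDR": bool(_CLD_READ.match(text)),
    }
```

Because the operand order alone separates the two CLD forms, the sketch needs nothing beyond pattern matching to distinguish a write from a read, which is consistent with predecoding being completed within the fetch period.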

In ID/MEM stage, the instruction register 54 is loaded
with the output of the instruction latch 50 (i.e., a fetched
instruction), synchronizing with a second internal clock
signal φCLK2 (referring to FIG. 6). If the fetched instruction
is a host instruction, the host instruction decoder 56 decodes
the host instruction. Then, it is ready to execute operations
designated by the host instruction by means of accessing the
data memory 36 and the host source registers 58, if necessary.
If the fetched instruction is not a defined instruction (or
it is a coprocessor instruction), the host processor 30 performs
no operation. In this case, the coprocessor interface
unit 154 receives the predecoding signals DCOP, DCLDW
and DCLDR from the coprocessor instruction identifier 52
and issues a coprocessor command corresponding to the
fetched coprocessor instruction by generating the coprocessor
interface signals A, B, C and D.

With reference to FIG. 7, a preferred embodiment of the
coprocessor interface unit 154 comprises an interface logic
circuit 80 and four latches 82, 84, 86 and 88. The interface
logic circuit 80 generates the interface signals B, C and D
and latch clock signals φA, φB, and φD in response to the
predecoding signals DCOP, DCLDW and DCLDR from the
coprocessor instruction identifier 52. Detailed circuit
configurations of the interface logic circuit 80 are shown in
FIGS. 8A and 8B. FIG. 8A illustrates a latch clock generator
logic and FIG. 8B illustrates an interface signal generator
logic.

In FIG. 8A showing a preferred embodiment of a latch
clock generator logic, an OR gate 202 has three inputs which
are applied with the predecoding signals DCOP, DCLDW
and DCLDR, respectively. An output of the OR gate 202 is
coupled to an input of an AND gate 204 whose other input
is provided with the second internal clock signal φCLK2.
The AND gate 204 outputs a latch clock signal φA for the
latch 82. The latch clock signal φA is synchronized with the
second internal clock signal φCLK2 (or CLK). The interface
signal A is generated only when one of the signals DCOP,
DCLDW and DCLDR becomes active, preventing power
consumption caused by frequent value changes of the signal
A. Each of the latch clock signals φB, φC and φD for the
latches 84, 86 and 88 has the same phase as that of the
second internal clock signal φCLK2.

FIG. 8B shows a preferred embodiment of an interface
signal generator logic comprising a first inverter 206, an OR
gate 208, and a second inverter 210. The first inverter 206
receives the predecoding signal DCOP and outputs the
interface signal B. The OR gate 208 has two inputs applied
with the predecoding signals DCLDW and DCLDR,
respectively, and an output coupled to the second inverter
210. The second inverter 210 outputs the interface signal C.
The predecoding signal DCLDW is directly used as interface
signal D.

Referring again to FIG. 7, the latches 82, 84, 86, and 88
latch the interface signals A, B, C, and D, respectively,
synchronizing with the latch clock signals φA, φB, φC, and
φD, respectively. The latch 82 directly latches the immediate
value (such as a coprocessor register address or a coprocessor
instruction) loaded on an instruction from the instruction
latch 50.

Turning back to FIG. 5, in ID/MEM stage, the coprocessor
instruction decoder 156 decodes the interface signals
(i.e., the coprocessor commands) from the coprocessor interface
unit 154 and provides for executions of the coprocessor
instructions, if necessary, by means of accessing the data
memory 36 and the coprocessor source registers 158.

For EX stage, for example, the host execution unit 62 such
as an ALU (arithmetic and logic unit), the host status register
66 to store host processor status data (e.g., sign, carry,
overflow, zero and other flags), and the branch condition
evaluation unit 68 to check the conditional flags of the host
status register 66 are provided for host processor 30. A
certain host destination register 64 in host register file is also
involved in EX stage. Similarly, a coprocessor execution
unit 162 (such as a specific arithmetic-function unit and a
normal arithmetic-function unit), a certain coprocessor
destination register in a coprocessor register file, and a
coprocessor status register 166 are provided for the coprocessor
32.

During EX stage, after an execution of a coprocessor
instruction, the branch condition evaluation unit 68 is supplied
with coprocessor status data E (e.g., sign, overflow,
and zero flags) from the coprocessor status register 166.
Then, the host processor 30 can use the status data for a next
conditional branch instruction, such as a jump (JMP), a
subroutine call (CALL), or a link (LNK) instruction, without
any coprocessor access operation. This, along with the
interface signals generated in ID/MEM period, makes it
possible to synchronize pipelined operations of the host
processor 30 and the coprocessor 32.

In a multi-processor system, data transfer between processors
is an important factor to determine efficiency of the
overall system. Suppose a processor receives a certain input
data stream. In order for the processor to share the data with
other processors, there should be an efficient mechanism to
transfer the data between the processor and the others. Such
data transfers are accomplished by means of a single shared
data memory, such as data memory 36 in FIG. 3. The shared
data memory in a multi-processor system has some inherent
problems such as data hazards and deadlocks.

The host-coprocessor system 10 of the present invention,
however, accesses the shared data memory 36 only at time
designated by the host processor 30. That is, during that
time, the coprocessor 32 only accesses the memory, while
the host processor 30 is guaranteed not to access the data
memory 36. Therefore, there is no contention over the
shared data memory 36. Another advantage of the
host-coprocessor system is that the coprocessor 32 can access the
data memory 36 in its own bandwidth.

Direct data transfers between the host processor 30 and
the coprocessor 32 are performed only with respect to CLD
instructions. CO_WRITE instructions (CLD #imm:8, Reg)
put data of a general-purpose register of the host processor
30 on data buses and issue the address (#imm:8) of a
coprocessor internal register on the signal A accompanying
the signal C active (Low) and the signal D active (Low).
CO_READ instructions (CLD Reg, #imm:8) work
similarly, except that data of the coprocessor internal register
addressed by a 8-bit immediate value is read into a
general-purpose register through data buses during the signal C
active and the signal D deactivated (High).

The coprocessor 32 is passive. That is, it does not have its
own programs. To perform a branch operation according to
a status of the coprocessor 32 (which may be an outcome of
a coprocessor instruction execution), conditional branch
instructions (JMP, CALL and LNK) of the host processor 30
directly refer to values of E<2:0> signal containing status
data of the coprocessor 32.

Loop instructions with little or no loop control overhead
are important for DSP programs, because otherwise they
spend a significant portion [...] control [...]
provides JNZD instruction [...] performs the decrement [...]
in single instruction with [...] JNZD instruction is just
1 cycle, which is assigned to the instruction itself. Since
the [...] any of general-purpose registers, loop nestings
are also possible.

Coprocessor Implementation

A 16-bit fixed-point DSP processor is, for example,
implemented as the coprocessor 32 for low-end DSP applications.
It is designed as one of the DSP coprocessor engines for the
host processor 30, which targets cost-sensitive and low-end
multimedia DSP applications. Generic coprocessor instructions
are renamed according to intended operations on the
coprocessor 32, including DSP data type and DSP addressing
mode. The coprocessor comprises four units: MU
(Multiplier Unit), AU (Arithmetic Unit), RPU (RAM Pointer
Unit), and IU (Interface Unit). MU (Multiplier Unit) basically
includes a 24-bit by 16-bit parallel multiplier and a
24-bit adder for multiply-and-accumulate (MAC) operations.
Hence, 16-bit by 16-bit MAC operations are performed
in two cycles in the coprocessor 32. AU performs
16-bit arithmetic and shift operations for DSP. RPU of the
coprocessor comprises three data memory pointers and two
control blocks for pointer modulo calculations. The pointers
are used for accessing the data memory 36 for a 16-bit data
operand. Since two 16-bit data operands can be fetched
simultaneously in a single cycle for MAC operation, the data
memory 36 should be partitioned into two parts: X and Y
memory. Note that the data memory 36 that the coprocessor
32 accesses through the pointers is the same data memory
that the host processor 30 accesses (see FIG. 3).

As explained above, there is no contention over the data
memory 36 because the coprocessor 32 accesses the data
memory 36 at ID/MEM stage of the corresponding coprocessor
instruction. IU is for communications between the
host processor 30 and coprocessor 32. It decodes the coprocessor
interface signals from the host processor and controls
the other units, according to the decoding result.

With the coprocessor, an N-tap real FIR filter and LMS
adaptive filter can be implemented with 3N+X cycles and
8N+X cycles, where X is additional cycles for input/output.

The host processor 30 and the coprocessor 32 can be
implemented using 0.5 μm double metal CMOS process. Die
sizes of the host processor and coprocessor are about 0.85
and 0.6 mm², respectively. Power consumptions of the host
processor and the coprocessor are 0.10 mW and 0.24 mW
per MIPS at 3V, respectively. And they can run up to 20
MHz (about 20 MIPS).

The host-coprocessor SOC architecture of the present
invention can also be used for digital caller identification
applications. By using host-plus-coprocessor platform, total
number of components can be reduced to a bare minimum,
and total power consumption which is a key factor to
telephony applications, can be minimized. DTMF (Dual
Tone Multiple Frequency) generator, FSK (Frequency Shift
Keying) demodulator for ID extraction, and CAS (CPE Alert
Signal) detector part of the caller ID algorithm are mapped
on the coprocessor and result in 0.2 MIPS, 0.7 MIPS and 1.6
MIPS, respectively. Program memories required for DTMF,
FSK, and CAS are 90, 580, and 512 bytes, respectively.
Since each of DSP algorithms runs exclusively, MIPS
required for the DSP algorithms is 1.6 MIPS. A system clock
for the host-coprocessor system runs at 5 MHz, which
provides enough computing power for DSP operations as
well as microcontroller operations. Hence the power consumption
of the host-coprocessor system for the digital
caller ID application will be well below 1.7 mW at 3V, 25
degrees, typical process condition. As an analog front end,
a 10-bit ADC (Analog to Digital Converter) can be used,
whose power consumption is about 1.5 mW at 3 V, 25
degrees, typical process condition. Hence our SOC solution
for a microcontroller with the digital caller ID function
estimates to consume less than 4.0 mW at 3V, 25 degrees,
typical process condition.

As described above, a microprocessor or microcontroller
according to the present invention has an efficient and
powerful coprocessor interface architecture. This interface
architecture targets low-power RISC microprocessors of a
relatively low bit width. The heart of the architecture is
"efficiency and flexibility" in designs. Advantageously, the
host-coprocessor SOC architecture of the present invention
includes efficient, low cost, and speedy pipelined operations.
In addition, since the coprocessor is synchronized with the
host processor, little or no extra communication overhead is
incurred. Further, since the coprocessor can be regarded as
a peripheral block to the host processor, only the coprocessor
(not the host processor) needs to be revised when it is
required to modify or upgrade a specific function of the
coprocessor.

It is understood that various other modifications will be
apparent to and can be readily made by those skilled in the
art without departing from the scope and spirit of this
invention. Accordingly, it is not intended that the scope of
the claims appended hereto be limited to the description as
set forth herein, but rather that the claims be construed as
encompassing all the features of patentable novelty that
reside in the present invention, including all features that
would be treated as equivalents thereof by those skilled in
the art which this invention pertains.

What is claimed is:

1. A data processing system comprising:

a program memory for storing instructions;

a host processor for performing pipelined operations and
for predecoding an instruction fetched from the program
memory during a fetch cycle of the instruction
and for issuing a command corresponding to the
instruction; and

at least one coprocessor for performing other pipelined
operations and for decoding the command received
from the host processor during a decoding cycle of the
instruction and for executing the decoded command
during an execution cycle of the instruction, wherein
the host processor generates a second predecoding
signal based on predecoding the instruction fetched
from the program memory, the second predecoding
signal includes commands for transferring data from
the coprocessor to the host processor.

2. The data processing system of claim 1, wherein the host
processor predecodes an instruction fetched from the program
memory when the host processor identifies the instruction
as a coprocessor instruction for the coprocessor.

3. The data processing system of claim 1, wherein the host
processor generates coprocessor interface signals when
an instruction fetched from the program memory is identified
as a coprocessor instruction, and transfers the command
to the coprocessor with the coprocessor interface signals.

4. The data processing system of claim 1, wherein the
coprocessor provides the host processor with coprocessor
status data representing status of the coprocessor after
executing the command received from the host processor.

5. The data processing system of claim 1, wherein the
coprocessor is adapted to execute digital signal processing
(DSP) functions.

6. The data processing system of claim 1, further comprising
a data memory connected to the host processor and
the coprocessor, wherein the host processor and the coprocessor
access the data memory at different times designated
by the host processor.

7. The data processing system of claim 1, wherein the host
processor generates a first predecoding signal based on
predecoding the instruction fetched from the program
memory, wherein the first predecoding signal includes
instructions requesting the coprocessor to perform a specific
operation designated by an immediate value loaded on the
first predecoding signal.

8. The data processing system of claim 4, wherein the host
processor receives the coprocessor status data from the
coprocessor, the host processor for generating at least one
conditional branch instruction which includes data for performing
at least one next operation of the pipelined operations.

9. The data processing system of claim 1, wherein the 
second predecoding signal includes at least one host pro- 
cessor register number and at least one coprocessor register 
address, whereby the data stored in at least one register 
designated by the at least one coprocessor register address is 
transferred to at least one register designated by the at least 
one host register number. 

10. The data processing system of claim 8, wherein the 
host processor generates a third predecoding signal based on 
predecoding the instruction fetched from the program 
memory, the third predecoding signal includes commands for
transferring data from the host processor to the coprocessor. 

11. The data processing system of claim 10, wherein the 
third predecoding signal includes at least one host processor 
register number and at least one coprocessor register 
address, whereby the data stored in at least one register 
designated by the at least one host register number is 
transferred to at least one register designated by the at least 
one coprocessor register address. 

12. A data processing system comprising: 

a program memory for storing instructions; 

a host processor for performing pipelined operations on 
instructions received in sequence from the program 
memory, wherein the host processor fetches, decodes, 
and executes a host instruction, predecodes a copro- 
cessor instruction during a fetch cycle of the coproces- 
sor instruction, and issues interface signals based on 
results of predecoding the coprocessor instruction; 

a coprocessor for performing pipelined operations based 
on the coprocessor instruction, wherein the
coprocessor decodes the interface signals received 
from the host processor during a decoding cycle of the 
coprocessor instruction and executes the coprocessor 
instruction responding to results of decoding the inter- 
face signals; 

a data memory for storing data, the data memory being 
connected to the host processor and the coprocessor; 
and 

an internal clock synchronized with a master clock for 
generating internal clock signals to the host processor 
for the pipelined operations. 

13. The data processing system of claim 12, wherein the 
host processor includes: 

an instruction fetch circuit for fetching the instructions in 
sequence from the program memory and for generating 
predecoding signals based on the instructions when the 
instructions are coprocessor instructions; 

a host instruction decode/memory access circuit con- 
nected to the instruction fetch circuit and the data 
memory, for decoding the instructions fetched by the 
instruction fetch circuit when the fetched instructions 
are instructions for control functions, wherein the host 
instruction decode/memory access circuit generates the 
interface signals to the coprocessor; and 

a host instruction execute circuit connected to the host 
instruction decode/memory access circuit and the data 
memory, for executing the instructions for control 
functions decoded by the host instruction decode/ 
memory access circuit, wherein the host instruction 
execute circuit generates conditional branch instruc- 
tions to the instruction fetch circuit responding to status 
data of the host processor and status data of the 
coprocessor received from the coprocessor. 
14. The data processing system of claim 13, wherein the
instruction fetch circuit includes: 

at least one instruction latch for latching at least one 
instruction from the program memory; and 

a coprocessor instruction identifier for determining 
whether the instruction fetched in the at least one 
instruction latch is a coprocessor instruction, for pre- 
decoding the fetched instruction identified as the copro- 
cessor instruction, and for generating the predecoding 
signals to the host instruction decode/memory access 
circuit. 

15. The data processing system of claim 13, wherein the 
host instruction decode/memory access circuit includes:

a host instruction decoder for decoding the fetched 
instruction received from the instruction fetch circuit; 

at least one host source register for receiving decoded 
results from the host instruction decoder and for gen- 
erating data for executing the fetched instruction; and 

a coprocessor interface unit for receiving the predecoding 
signals from the instruction fetch circuit and for gen- 
erating the interface signals to the coprocessor. 

16. The data processing system of claim 15, further 
including an instruction register for receiving the fetched 
instruction from the instruction fetch circuit and providing 
the fetched instruction to the host instruction decoder, 
wherein the instruction register is synchronized with a 
second internal clock signal from the internal clock. 

17. The data processing system of claim 15, wherein the 
host instruction decoder provides the data memory with 
results of decoding the fetched instruction, whereby the data 
memory provides for data for executing the fetched instruc- 
tion. 

18. The data processing system of claim 15, wherein the 
host instruction execute circuit includes: 

a host execution unit for receiving the decoded results 
from the host instruction decode/memory access circuit 
and data from the data memory, and for executing the 
instructions for control functions responding to the 
decoded results and the data; 

a host status register connected to the host execution unit, 
for storing the status data of the host processor; and 

a branch condition evaluation unit for receiving the status 
data of host processor from the host status register and 
the status data of coprocessor from the coprocessor, and 
for generating the conditional branch instructions to the 
instruction fetch circuit. 

19. The data processing system of claim 15, wherein the 
coprocessor interface unit is synchronized with a second 
internal clock signal from the internal clock. 

20. The data processing system of claim 15, the copro- 
cessor interface unit includes: 

a latch clock generator logic for generating a plurality of 
latch clock signals in response to a second internal 
clock signal and the predecoding signals; 

an interface signal generator logic for generating the 
interface signals responding to the predecoding signals;
and 

a plurality of latches for latching the interface signals in 
synchronization with the latch clock signals. 

21. The data processing system of claim 20, wherein the 
latch clock generator logic includes: 

an OR gate having inputs receiving the predecoding 
signals, each of the inputs receiving each of the pre- 
decoding signals; 

an internal clock signal terminal for receiving a second 
internal clock signal from the internal clock and for 
providing the at least one first latch with the second
internal clock signal as at least a first latch clock signal; 
and 

an AND gate having a first input connecting to an output 
of the OR gate, a second input connecting to the 
internal clock signal terminal, and an output providing 
the at least one second latch with at least one second 
latch clock signal. 

22. The data processing system of claim 20, wherein the 
interface signal generator logic includes: 

a first inverter having an input receiving a first signal of 
the predecoding signals from the instruction fetch cir- 
cuit and an output providing the coprocessor with a first 
signal of the interface signals; 

an AND gate having at least first and second inputs 
receiving at least second and third signals of the 
predecoding signals, respectively, wherein the second 
signal of the predecoding signals is provided to the 
coprocessor as a second signal of the interface signals; 
and 

a second inverter for inverting output of the AND gate and 
for providing the coprocessor with the inverted signal 
as a third signal of the interface signals. 

23. The data processing system of claim 12, wherein the 
coprocessor includes: 

a coprocessor instruction decode/memory access circuit 
for receiving the interface signals from the host pro- 
cessor and for decoding the interface signals, wherein 
the coprocessor instruction decode/memory access cir- 
cuit provides the data memory with results of decoding 
the interface signals; and 

a coprocessor instruction execute circuit for receiving the 
decoded results from the coprocessor instruction 
decode/memory access circuit and data from the data 
memory designated by the decoded results, and for 
executing the coprocessor instruction in response to the 
data and the decoded results. 

24. The data processing system of claim 23, wherein the 
coprocessor instruction execute circuit includes: 

a coprocessor execution unit for receiving the data from 
the data memory and the decoded results from the 
coprocessor instruction decode/memory access circuit, 
and for executing the coprocessor instruction in 
response to the data and the decoded results; and 

a coprocessor status register connected to the coprocessor 
execution unit and the host processor, for storing the 
status data of the coprocessor and providing the status 
data to the host processor. 

25. A method for performing operations of pipelined 
instructions in sequence in a data processing system for 
performing host instructions for normal control functions 
and for performing coprocessor instructions for additional 
specific functions, the method comprising the steps of: 

(a) fetching an instruction from a program memory in a 
fetch stage of the instruction; 
(b) determining during the fetch stage whether the fetched
instruction is a coprocessor instruction for a specific 
function; 

(c) predecoding the fetched instruction during the fetch 
stage when the fetched instruction is the coprocessor 
instruction; 

(d) issuing a coprocessor command corresponding to the 
fetched instruction in a decode/memory access stage of 
the instruction, wherein the coprocessor command is 
based on predecoded results in the step (c); 

(e) decoding the coprocessor command during the 
decode/memory access stage; and 

(f) executing the coprocessor command as designated by 
decoded results of the step (e) at a time designated by 
a host processor of the data processing system, wherein 
during the time designated the host processor does not 
access the program memory shared with a coprocessor 
of the data processing system. 

26. The method of claim 25, further comprising the steps
of:

(g) decoding the instruction fetched in the step (a) in a
decode/memory access stage of the instruction when
the fetched instruction is a host instruction for normal
control function; and

(h) executing the host instruction in an execute stage of the
instruction in response to decoded results of the step (g).

27. The method of claim 26, further comprising the steps 
of: 

providing for a first status data representing status of the 
data processing system after performing the execution 
of the coprocessor instruction in the step (f); 
providing for a second status data representing status of 
the data processing system after performing the execu- 
tion of the host instruction in the step (h); and 
evaluating a next conditional branch instruction from the 
first and the second status data, wherein the next 
conditional branch instruction is used in a next fetch 
stage of a next instruction to be fetched from the 
program memory. 

28. The method of claim 25, further comprising the step 
of synchronizing the fetch stage and the decode/memory 
access stage with a first internal clock signal and a second 
internal clock signal, respectively. 

29. The method of claim 25, wherein the step (c) com- 
prises the step of generating a second predecoding signal 
based on predecoding the fetched instruction, the second 
predecoding signal comprises at least one host processor 
register number and at least one coprocessor register 
address, whereby the data stored in at least one register 
designated by the at least one coprocessor register address is 
transferred to at least one register designated by the at least 
one host register number. 